Method and Apparatus for Distributing Multimedia to Remote Clients

ABSTRACT

Video and audio signals are streamed to remote viewers that are connected to a communication network. A host server receives an originating video and audio signal that may arrive from a single source or from a plurality of independent sources. The host server provides any combination of the originating video and audio signals to viewers connected to a communication network. A viewer requests that the host server provide a combination of video and audio signals. The host server transmits an instruction set to be executed by the viewer. The instruction set causes the viewer to transmit parameters to the host server, including parameters relating to the processing capabilities of the viewer. The host server then transmits multimedia data to the viewer according to the received parameters. A plurality of viewers may be simultaneously connected to the host server. Each of the plurality of viewers may configure the received video and audio signals independent of any other viewer and may generate alerts based on the video and audio content.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of, and incorporates by reference in its entirety, U.S. patent application Ser. No. 11/515,255, filed on Sep. 1, 2006, and U.S. patent application Ser. No. 09/652,113, filed on Aug. 29, 2000, now U.S. Pat. No. 7,103,668, issued on Sep. 5, 2006.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The invention relates to devices and systems for communicating over a network. More particularly, the invention relates to a method and apparatus for streaming a multimedia signal to remote viewers connected to a communication network.

2. Description of the Related Art

The constantly increasing processing power available in hardware devices such as personal computers, personal digital assistants, wireless phones and other consumer devices allows highly complex functions to be performed within the device. The hardware devices can perform complex calculations in order to implement functions such as spreadsheets, word processing, database management, data input and data output. Common forms of data output include video and audio output.

Personal computers, personal digital assistants and wireless phones commonly incorporate displays and speakers in order to provide video and audio output. A personal computer incorporates a monitor as the display terminal. The monitor, or display, on most personal computers can be configured independently of the processor to allow varying levels of resolution. The display for personal computers is typically capable of very high resolution, even on laptop-style computers.

In contrast, displays are permanently integrated into personal digital assistants and wireless phones. An electronic device having a dedicated display device formats data for display using dedicated hardware. The processing capabilities of the hardware as well as the display capabilities limit the amount of information displayed and the quality of the display to levels below that typically available from a personal computer. The lower quality manifests as fewer pixels per inch, an inability to display colors, or a smaller viewing area.

A personal computer may integrate one of a number of hardware interfaces in order to display video output on a monitor. A modular video card or a set of video interface Integrated Circuits (ICs) is used by the personal computer to generate the digital signals required to generate an image on the monitor. The digital signals used by a computer monitor differ from the analog composite video signal used in a television monitor. However, the personal computer may incorporate dedicated hardware, such as a video capture card, to translate analog composite video signals into the digital signals required to generate an image on the monitor. Thus, the personal computer may display, on the monitor, video images captured using a video camera, or video images output from a video source such as a video tape recorder, digital video disk player, laser disk player, or cable television converter.

The video capture card, or equivalent hardware, also allows the personal computer to save individual video frames provided from a video source. The individual video frames may be saved in any file format recognized as a standard for images. A common graphic image format is the Joint Photographic Experts Group (JPEG) format that is defined in International Organization for Standardization (ISO) standard ISO-10918, titled DIGITAL COMPRESSION AND CODING OF CONTINUOUS-TONE STILL IMAGES. The JPEG standard allows a user the opportunity to specify the quality of the stored image. The highest quality image results in the largest file, and typically, a trade-off is made between image quality and file size. The personal computer can display a moving picture from a collection of JPEG-encoded images by rapidly displaying the images sequentially, in much the same way that the individual frames of a movie are sequenced to simulate moving pictures.

The volumes of data and image files generated within any individual personal computer provide limited utility unless the files can be distributed. Files can be distributed among hardware devices in electronic form through mechanical means, such as by saving a file onto a portable medium and transferring the file from the portable medium (e.g., floppy disks) to another computer.

Such mechanical file transfers are not particularly efficient and may be limited by the capacity of the transfer medium. A more efficient method of transferring files between computers is by using some type of communication link. The most basic communication link is a hardwired connection between the two computers transferring information. However, information may also be transferred using a network of computers.

A computer may be connected to a local network where a number of processors are linked together using dedicated communication links. File transfer speed on a dedicated network is typically constrained by the speed of the communication hardware. The physical network is typically hardwired and capable of providing a large signal bandwidth.

More widespread remote networks may take advantage of existing infrastructure in order to provide the communication link between networked processors. One common configuration allows remote devices to connect to a network using telephone land lines. The communication link is the factor constraining data transfer speed where low-bandwidth communication links such as telephone land lines are used as network connections.

One well-known public network that allows a variety of simultaneous communication links is the Internet. As used herein, “Internet” refers to a network or combination of networks spanning any geographical area, such as a local area network, wide area network, regional network, national network, and/or global network. As used herein, “Internet” may refer to hardwire networks, wireless networks, or a combination of hardwire and wireless networks. Hardwire networks may include, for example, fiber optic lines, cable lines, ISDN lines, copper lines, etc. Wireless networks may include, for example, cellular systems, personal communication services (PCS) systems, satellite communication systems, packet radio systems, and mobile broadband systems.

Individual computers may connect to the Internet using communication links having vastly differing information bandwidths. The fastest connections to the network use fiber connections directly to the network “backbone”. Connections to the network having a lower information bandwidth use E1 or T1 telephone line connections to a fiber link. Of course, the cost of the communication link is proportional to the available information bandwidth.

Network connections are not limited to computers. Any hardware device capable of data communication may be connected to a network. Personal digital assistants as well as wireless phones typically incorporate the ability to connect to networks in order to exchange data. Hardware devices often incorporate the hardware or software required to allow the device to communicate over the Internet. Thus, the Internet operates as a network to allow data transfer between computers, network-enabled wireless phones, and personal digital assistants.

One potential use of networks is the transfer of graphic images and audio data from a host to a number of remote viewers. As discussed above, a computer can store a number of captured graphic images and audio data within its memory. These files can then be distributed over the network to any number of viewers. The host can provide a simulation of real-time video by capturing successive video frames from a source, digitizing the video signal, and providing access to the files. A viewer can then download and display the successive files. The viewer can effectively display real-time streaming video where the host continually captures, digitizes, and provides files based on a real-time video source.

The distribution of captured real-time video signals over a network presents several problems. For example, there is no flexibility in the distribution of files to various users. A host captures the video and audio signals and generates files associated with each type of signal. As previously discussed, graphic images are commonly stored as JPEG-encoded images. The use of JPEG encoding can compress the size of the graphic image file but, depending on the graphic resolution selected by the host, the image file may still be very large. The network connection at the host is an initial bottleneck to efficient file transfer. If the host sends files to the network using only a phone modem connection to transfer multiple-megabyte files, no viewer will be able to display the video and audio signals in a manner resembling real-time streaming video.
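
An illustrative calculation makes the bottleneck concrete (the one-megabyte frame size and 56 kbps modem rate are assumed figures chosen only for illustration, not values taken from this description):

\[
t \approx \frac{1\ \text{MB} \times 8{,}000{,}000\ \text{bits/MB}}{56{,}000\ \text{bits/s}} \approx 143\ \text{seconds per frame},
\]

more than four thousand times slower than the 1/30 of a second available per frame for 30 frame-per-second real-time video.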

The viewer's network connection becomes another data transfer bottleneck even if the host can send files to the network instantaneously. A viewer with a phone modem connection will not be able to transfer high-resolution images at a speed sufficient to support real-time streaming video.

One option is for the host to capture and encode any images in the lowest possible resolution to allow even the slowest connection to view real-time streaming video. However, the effect of capturing low-resolution images to enable the most primitive system's access to the images is to degrade the performance of a majority of viewers. Additionally, the images may need to be saved in such a low resolution that all detail is lost from the images. Degradation of the images, therefore, is not a plausible solution.

Another problem encountered is the inability of all users to support the same graphical image format selected by the host. Most personal computers are able to support the JPEG image format; however, network-enabled wireless phones or personal digital assistants may not be able to interpret the JPEG image format. Additionally, the less sophisticated hardware devices may not incorporate color displays. Access to video images should be provided to these users as well.

Finally, in such video distribution systems, the viewer has no control over the images. The viewer must rely solely on the host to provide a formatted and sized image having the proper view, resolution, and image settings. The viewer cannot adjust the image being displayed, the image resolution, or the image settings such as brightness, contrast and color. Further, the viewer is unable to control such parameters as compression of the transmitted data and the frame rate of video transmission.

SUMMARY OF THE INVENTION

The present invention is directed to an apparatus and method of transferring video and/or audio data to viewers such that the viewers can effectively display real-time streaming video output and continuous audio output. The apparatus and method may adapt the streaming video to each viewer such that system performance is not degraded by the presence of viewers having slow connections or by the presence of viewers having different hardware devices. The apparatus and method can further provide a level of image control to the viewer where each viewer can independently control the images received.

BRIEF DESCRIPTION OF THE DRAWINGS

The features, objectives, and advantages of the invention will become apparent from the detailed description set forth below when taken in conjunction with the drawings, wherein like parts are identified with like reference numerals throughout, and wherein:

FIG. 1 is a block diagram of one embodiment of a multimedia distribution system.

FIG. 2 is an overview of the main program shown in FIG. 1.

FIG. 3 is a block diagram of a personal computer implementing the host process.

FIG. 4A is a diagram illustrating the video capture module.

FIG. 4B is a flow chart illustrating the function of the switching system.

FIG. 5A is a block diagram of a multimedia distribution module wherein the host operates as a server.

FIG. 5B is a block diagram illustrating the broadcast of video data by a web server.

FIG. 6 is a block diagram of a video stream format.

FIG. 7 is a block diagram of various video block formats.

FIG. 8 is a flow chart illustrating motion detection at a block level.

FIG. 9 is a flow chart illustrating motion detection at a frame level.

FIG. 10 is a flow chart illustrating a method of transmitting only those video image blocks that change.

FIG. 11 is a block diagram of an audio stream format.

FIG. 12 is a flow chart illustrating the encoding and generation of an audio frame.

FIG. 13 is a block diagram illustrating the broadcast of audio data by a web server.

FIG. 14 is a flow chart illustrating the dynamic updating of the domain name system.

FIG. 15 is a block diagram of a system for mirroring audio and video data.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

As used herein, a computer, including one or more computers comprising a web server, may be any microprocessor- or processor-controlled device or system that permits access to a network, including terminal devices, such as personal computers, workstations, servers, clients, minicomputers, main-frame computers, laptop computers, a network of individual computers, mobile computers, palm-top computers, hand-held computers, set top boxes for a television, interactive televisions, interactive kiosks, personal digital assistants, interactive wireless communications devices, mobile browsers, or a combination thereof. The computers may further possess input devices such as a keyboard, mouse, touchpad, joystick, pen-input-pad, and output devices such as a computer screen and a speaker.

These computers may be uni-processor or multi-processor machines. Additionally, these computers include an addressable storage medium or computer accessible medium, such as random access memory (RAM), an electronically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), hard disks, floppy disks, laser disk players, digital video devices, compact disks, video tapes, audio tapes, magnetic recording tracks, electronic networks, and other techniques to transmit or store electronic content such as, by way of example, programs and data. In one embodiment, the computers are equipped with a network communication device such as a network interface card, a modem, or other network connection device suitable for connecting to a networked communication medium.

Furthermore, the computers execute an appropriate operating system such as Linux, Unix, Microsoft® Windows®, Apple® MacOS®, and IBM® OS/2®. As is conventional, the appropriate operating system includes a communications protocol implementation which handles all incoming and outgoing message traffic passed over a network. In other embodiments, while different computers may employ different operating systems, the operating system will continue to provide the appropriate communications protocols necessary to establish communication links with a network.

The computers may advantageously contain program logic, or other substrate configuration representing data and instructions, which cause the computer to operate in a specific and predefined manner as described herein. In one embodiment, the program logic may advantageously be implemented as one or more modules.

As can be appreciated by one of ordinary skill in the art, each of the modules may comprise various sub-routines, procedures, definitional statements and macros. Each of the modules is typically separately compiled and linked into a single executable program. Therefore, the description of each of the modules in this disclosure is used for convenience to describe the functionality of the preferred system. Thus, the processes that are performed by each of the modules may be arbitrarily redistributed to one of the other modules, combined together in a single module, or made available in, for example, a shareable dynamic link library.

The modules may advantageously be configured to reside on the addressable storage medium and configured to execute on one or more processors. The modules include, but are not limited to, software or hardware components which perform certain tasks. Thus, a module may include, by way of example, components such as software components, object-oriented software components, class components and task components, processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, Java byte codes, circuitry, data, databases, data structures, tables, arrays, and variables.

As used herein, multimedia refers to data in any form. For example, it may include video frames, audio blocks, text data, or any other data or information. Multimedia information may include any individual form or any combination of the various forms.

A block diagram of a multimedia distribution system according to aspects of the invention is shown in FIG. 1. The system is composed of a host 10 interfaced through, for example, a network 20 to at least one client 30. The host 10 is a computer including one or more processes or modules and may interface with various hardware devices on the computer. A process or module may be a set of instructions implemented in software, firmware or hardware, including any type of programmed step undertaken by components of the system. The client 30 is another computer including one or more processes or modules. Advantageously, the client 30 is a remote computer interconnected to the host 10 through a network 20. The network 20 is any type of communication network as is commonly known by one skilled in the field and as was described previously. The network 20 may be a Local Area Network (LAN), a Wide Area Network (WAN), a public network such as the Internet, or a wireless network, or any combination of such networks. The network 20 interconnection between the host 10 and the client 30 may be accomplished using hard wired lines or through wireless Radio Frequency (RF) links. The various embodiments of the invention are not limited by the interconnection method used in the network 20 or the physical location of the host 10 or clients 30.

A number of processes operate within the host 10 in order to allow the host 10 to interface with external devices 80 and with the client 30 through the network 20. One or more capture devices 42 interface with external devices 80 in order to transform the data provided by an external device 80 into a format usable by the host 10. In one embodiment, the capture device 42 is a video capture card that interfaces to an external video source. The video source may be generated by a video camera, video disc player, video cassette recorder, television video output, or any other device capable of generating a video source. The video capture card grabs the frames from the video source, converts them to digital signals, and formats the digital signals into a format usable by the host 10. The external device 80 may also be a video card within a computer for converting video signals that are routed to a monitor into a format usable by the host 10.

The external devices 80 are not limited to video sources and can include devices or sources of data of interest. For example, the external devices 80 may generate audio data. The capture device 42 interfaces with an audio source to convert the input signal to a digital signal, then to convert the digital signals into a format usable by the host 10. A variety of external devices 80 may be used to provide an audio signal. An audio signal may be provided from a microphone, a radio, a compact disc player, television audio output, or any other audio source.

Multiple external devices 80 may interface with the host 10. The external devices 80 may provide inputs to the host 10 simultaneously, sequentially, or in some combination. A switcher module 44 is used where there is a controllable switch (not shown) that is used to multiplex signals from multiple sources to a single capture device 42. The switcher 44 is used where multiple sources are controlled and is omitted if the host 10 does not have control over the selection of the source. If used, the switcher 44 receives control information through a communication port on the computer. An exemplary embodiment of a hardware switch used to multiplex multiple video sources to a single video capture card is provided in copending U.S. patent application Ser. No. 09/439,853, filed Nov. 12, 1999, entitled SIGNAL SWITCHING DEVICE AND METHOD, assigned to the assignee of the current application, and hereby incorporated herein by reference. A similar hardware switch may be used to multiplex multiple audio sources to a single audio capture card.

A multimedia operating system module 49 allows the capture devices to interface with one or more capture modules 40 a, 40 b. The capture modules 40 a, 40 b monitor the capture devices and respond to requests for images by transmitting the captured information in JPEG-encoded format, for example, to the main program module 46.

The host also includes a web server module 50, such as the Apache web server available from the Apache Software Foundation. The web server 50 is used to configure the host 10 as a web server. The web server 50 interfaces the host 10 with the various clients 30 through the network 20. The web server 50 sets up an initial connection to the client 30 following a client request. One or more Common Gateway Interfaces (CGI) 52 a, 52 b are launched for each client 30 by the web server module 50. Each CGI 52 submits periodic requests to the main program 46 for updated video frames or audio blocks. The web server 50 also configures the dedicated CGI 52 adapted to the capabilities of each client 30. The client 30 may monitor the connection and maintain some control over the information sent through the CGI 52. The client 30 can cause the web server 50 to launch a “set param” CGI module 54 to change connection parameters. The web server 50 conveys the control information to the other host processes through the “set param” CGI 54. Once the web server 50 establishes the network connection, the CGI 52 controls the information flow to the client 30.

The client 30 interfaces to the host through the network 20 using an interface module such as a browser 32. Commercially available browsers include Netscape Navigator and Microsoft's Internet Explorer. The browser 32 implements the communication formatting and protocol necessary for communication over the network 20. The client 30 is typically capable of two-way communications with the host 10. The two-way link allows the client 30 to send information as well as receive information. A TCP/IP socket operating system module 59 running on the host 10 allows the host to establish sockets for communication between the host 10 and the client 30.

The host 10 may also incorporate other modules not directly allocated to establishing communications to the client 30. For example, an IP PROC 60 may be included within the host 10 when the host 10 is configured to operate over, for example, the Internet. The IP PROC 60 is used to communicate the host's 10 Internet Protocol (IP) address. The IP PROC 60 is particularly useful when the host's IP address is dynamic and changes each time the host 10 initially connects to the network 20. In one embodiment, the IP PROC 60 at the host 10 works in conjunction with a Domain Name System (DNS) host server 90 (described in further detail below with reference to FIG. 14) connected to the network to allow clients 30 to locate and establish a connection to the host 10 even though the host 10 has a dynamic IP address.

An overview of the main program module 46 is provided in FIG. 2. The host implements a user interface 204 to receive input from the user through, for example, a keyboard or a mouse and to provide display and audio output to the user. The output may be in the form of an operating window displayed on a monitor that provides the user with an image display and corresponding control menus that can be accessed using a keyboard, a mouse or other user interface devices. A scheduler 210 operates simultaneously with the user interface 204 to control the operation of various modules. The user or an administrator of the host system may set up the scheduling of multimedia capture using the scheduler 210. Images or audio may be captured over particular time windows under the control of the scheduler 210, and those time windows can be selected or set by a user.

A licensing module 214 is used to either provide or deny the user access to specific features within the system. As is described in detail below, many features may be included in the system. The modularized design of the features allows independent control over user access to each feature. Independent control over user access allows the system to be tailored to the specific user's needs. A user can initially set up the minimum configuration required to support the basic system requirements and then later upgrade to additional features to provide system enhancements. Software licensing control allows the user access to additional features without requiring the user to install a new software version with the addition of each enhancement.

The host also performs subsystem control processes 220. The host oversees all of the subsystem processes that are integrated into the multimedia distribution system. These sub-processes include the multimedia capture system 230 that controls the capture of the video and audio images and the processing and formatting of the captured data. There may be numerous independent CGI processes running simultaneously, depending on the number of clients connected to the host and the host's capacity. Each of the CGI processes accesses the network and provides output to the clients depending on the available captured data and the capabilities of the client.

A motion detection 240 process operates on the captured images to allow detection of motion over a sequence of the captured images. Motion detection can be performed on the entire image or may be limited to only a portion of the image. The operation of motion detection will be discussed in detail later.

Another process is an event response 250. The event response 250 process allows a number of predefined events to be configured as triggering events. In addition to motion detection, the triggering event may be the passage of time, detection of audio, a particular instant in time, user input, or any other event that the host process can detect. The triggering events cause a response to be generated. The particular response is configurable and may include generation and transmission of an email message, generation of an audio alert, capture and storage of a series of images or audio, execution of a particular routine, or any other configurable response or combination of responses.

Additional processes include an FTP process 260 and an IP Updater process 270. As discussed with reference to FIG. 1, the FTP process transfers the multimedia data to an FTP server to allow widespread access to the data. The IP Updater 270 operates to update the IP address of the host. The host may be identified by a domain name that is easily remembered. The domain name corresponds to an Internet Protocol address, but the host process may be connected to a network that utilizes dynamic IP addresses. The IP address of the server may change each time the host disconnects and reconnects to the network if dynamic IP addresses are used. The IP Updater 270 operates in conjunction with a Domain Name System (DNS) server to continually update the IP address of the host such that the host's domain name will always correspond to the appropriate IP address.
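
The following is a minimal Java sketch of an IP Updater of this kind. The update endpoint, query parameter names, and one-minute polling interval are illustrative assumptions rather than details taken from this description; a real dynamic-DNS service defines its own update interface.

    import java.net.InetAddress;
    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    // Minimal IP Updater sketch: periodically reports this host's current
    // address to a dynamic-DNS server so the domain name stays valid.
    public class IpUpdater {
        // Hypothetical update endpoint; not specified in the description.
        private static final String UPDATE_URL = "http://dns.example.com/update";
        private String lastReported = "";

        void checkAndUpdate(String domain) throws Exception {
            // Note: getLocalHost() may return a local interface address;
            // a production updater would discover the public address.
            String current = InetAddress.getLocalHost().getHostAddress();
            if (current.equals(lastReported)) {
                return; // address unchanged; nothing to report
            }
            HttpRequest req = HttpRequest.newBuilder(
                    URI.create(UPDATE_URL + "?host=" + domain + "&ip=" + current))
                    .build();
            HttpResponse<String> resp = HttpClient.newHttpClient()
                    .send(req, HttpResponse.BodyHandlers.ofString());
            if (resp.statusCode() == 200) {
                lastReported = current; // DNS record now maps domain -> current IP
            }
        }

        public static void main(String[] args) throws Exception {
            IpUpdater updater = new IpUpdater();
            while (true) {
                updater.checkAndUpdate("myhost.example.com");
                Thread.sleep(60_000); // re-check once a minute (assumed interval)
            }
        }
    }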

An example of a computer on which the host process resides is illustrated schematically in FIG. 3. The block diagram of FIG. 3 shows the host implemented on a personal computer 300. The host process is stored as a collection of instructions that are stored in the personal computer 300. The instructions may be stored in memory 304, such as Read-Only Memory (ROM) or Random Access Memory (RAM), a hard disk 306, a floppy disk to be used in conjunction with a floppy disk drive 308, or a combination of storage devices. The instructions are executed in the Central Processing Unit (CPU) 302 and are accessed through a bus 360 coupling the storage devices 304, 306, 308 to the CPU 302. The bus 360 can include at least one address bus and one data bus, although multiple buses may also be used. User input is coupled to the personal computer 300 through a keyboard 310, a mouse 312 or other user input device. Images are displayed to the user through a monitor 314 that receives signals from a video controller 316.

Video images are provided to the personal computer 300 from external video sources coupled to a video capture card 320. Although any video source may be used, a camera 322 and VCR 324 are shown in FIG. 3. A video switching system 330 is used to multiplex multiple video sources to a single video capture card 320. The video switching system 330 is controlled through a serial device controller 340. The host process controls which video source is used to supply the input by controlling the video switching system 330. The video switching system 330 is described further in the patent application previously incorporated by reference and is described below with reference to FIG. 4B.

Similarly, external audio sources are used to provide audio input to the personal computer 300. A microphone 352 and CD player 354 are shown as the external audio sources, although any audio source may be used. Audio is coupled from the external audio sources 352, 354 to the host process using an audio card 350.

The connection from the host to the network is made using a Network Interface Card (NIC) 360. The NIC 360 is an Ethernet card, but may be substituted with, for example, a telephone modem, a cable modem, a wireless modem or any other network interface.

FIG. 4A is a diagram illustrating a process for video capture using an apparatus such as that shown in FIG. 3. A video signal is generated in at least one video source 410. One video source may be used or a plurality of video sources may be used. A video switching system 330 is used when a plurality of video sources 410 is present. Each video source is connected to an input port of the video switching system 330. The video switching system 330 routes one of the plurality of input video signals to the video capture hardware 320 depending on the control settings provided to the video switching system 330 through a serial communications 340 link from the switcher 44 (see FIG. 1).

Video sources such as a VCR, TV tuner, or video camera generate composite video signals. The video capture hardware 320 captures a single video frame and digitizes it when the video switching system 330 routes a video source outputting composite video signals to the video capture hardware 320. The system captures an image using an Application Program Interface (API) 420, such as Video for Windows available from Microsoft Corp. The API transmits the captured image to the video capture module 430.

FIG. 4B is a flow chart illustrating the function of the video switching module 330 shown in FIGS. 3 and 4A. The video subsystem maintains a cache of time-stamped video images for each video-input source. Requests for data are placed on a queue in the serial communications module 340. When the video switching module 330 receives a request from the queue (step 452), it first determines whether the requested image is available (step 454). The requested image may be unavailable if, for example, the image is in the process of being captured. If the image is not available, the process returns to step 452 and attempts to process the request again at step 454. If the requested image is available, the switching module 330 determines whether the image already exists in the cache (step 456). If the image exists in the cache, the switching module 330 sends the image to the requesting CGI 52 a, 52 b (see FIG. 1) and removes the request from the queue (step 468). If the image does not exist in the cache, the switching module 330 proceeds to obtain the image. First, it determines whether the switcher is set to the source of the requested image (step 458). If the switcher is set to the proper source, the image is captured and placed in the cache (step 466). The image is then sent to the requesting CGI and the request is removed from the queue (step 468). If the switcher is not set to the proper source, the switching module 330 causes a command to be sent to the switcher to switch to the source of the requested image (step 460). Next, depending on the video source and the capture device, optional operations may be performed to empty pipelines in the capture device's hardware or driver implementation (step 462). This is determined via test and interaction with the device during installation. The switching module 330 then waits a predetermined length of time (step 464). This delay allows the video capture device to synchronize with the new video input stream. The requested image is then captured and placed in the cache (step 466). The image is then sent to the requesting CGI, and the request is removed from the queue (step 468). Once the request has been removed, the switching module 330 returns to the queue to process the next request. Although the above description relates to the switching of video inputs, it may also apply to any switching module including, for example, the multimedia switcher 44 illustrated in FIG. 1.
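
A compact Java sketch of the request-processing loop of FIG. 4B follows. The queue, cache, switcher, and capture calls are illustrative stand-ins for the host modules; the settle delay and stub bodies are assumptions.

    import java.util.ArrayDeque;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Queue;

    // Sketch of the switching loop of FIG. 4B. All types here are
    // illustrative stand-ins for the host's switcher, capture, and CGI
    // modules.
    public class SwitchingModule {
        record Request(int source) {}

        private final Queue<Request> queue = new ArrayDeque<>();
        private final Map<Integer, byte[]> cache = new HashMap<>(); // source -> latest JPEG
        private int currentSource = -1;
        private static final long SETTLE_MS = 100; // assumed settle delay (step 464)

        void processOne() throws InterruptedException {
            Request req = queue.peek();                    // step 452: next request
            if (req == null || !isAvailable(req.source())) // step 454: image ready?
                return;                                    // retry on next pass
            byte[] img = cache.get(req.source());          // step 456: in the cache?
            if (img == null) {
                if (currentSource != req.source()) {       // step 458: right source?
                    switchTo(req.source());                // step 460: command switcher
                    Thread.sleep(SETTLE_MS);               // steps 462-464: flush, settle
                }
                img = captureImage();                      // step 466: grab frame
                cache.put(req.source(), img);              //           and cache it
            }
            sendToCgi(img);                                // step 468: reply to CGI
            queue.remove();                                //           and dequeue
        }

        // Stubs standing in for host hardware and CGI plumbing.
        boolean isAvailable(int source) { return true; }
        void switchTo(int source) { currentSource = source; }
        byte[] captureImage() { return new byte[0]; }
        void sendToCgi(byte[] jpeg) { }
    }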

Audio signals are captured in a process (not shown) similar to video capture. Audio sources are connected to multimedia audio hardware in the personal computer. The audio capture module makes periodic requests through an API such as Windows Multimedia, available from Microsoft Corp., for audio samples and makes the data available as a continuous audio stream.

The host 10 (see FIG. 1) distributes the multimedia data to requesting clients once the multimedia data has been captured. As noted above, the host running the multimedia distribution application is configured as a web server 50 in order to allow connections by numerous clients.

The client 30 can be a remote hardware system that is also connected to the network. The client may be configured to run a Java-enabled browser. The term “browser” is used to indicate an application that provides a user interface to the network, particularly if the network is the World Wide Web. The browser allows the user to look at and interact with the information provided on the World Wide Web. A variety of commercially available browsers are available for computers. Similarly, compact browsers are available for use in portable devices such as wireless phones and personal digital assistants. The features available in the browser may be limited by the available processing, memory, and display capabilities of the hardware device running the browser.

Java is a programming language developed especially for writing client/server and networked applications. A Java applet is commonly sent to users connected to a particular web site. The Java archive, or Jar, format represents a compressed format for sending Java applets. In a Jar file, instructions contained in the Java applet are compressed to enable faster delivery across a network connection. A client running a Java-enabled browser can connect to the server and request multimedia images.

Wireless devices may implement browsers using the Wireless Application Protocol (WAP) or other wireless modes. WAP is a specification for a set of communication protocols to standardize the way that wireless devices, such as wireless phones and radio transceivers, are used for Internet access.

Referring to FIGS. 1 and 5A, a client 30 initially connecting via the network 20 to the host makes a web request, or Type I request 512, while logged on a website. As used herein, the term “website” refers to one or more interrelated web page files and other files and programs on one or more web servers. The files and programs are accessible over a computer network, such as the Internet, by sending a hypertext transfer protocol (HTTP) request specifying a uniform resource locator (URL) that identifies the location of one of the web page files. The files and programs may be owned, managed or authorized by a single business entity or an individual. Such files and programs can include, for example, hypertext markup language (HTML) files, common gateway interface (CGI) files, and Java applications.

As used herein, a “web page” comprises that which is presented by a standard web browser in response to an HTTP request specifying the URL by which the web page file is identified. A web page can include, for example, text, images, sound, video, and animation.

The server performs Type I processing 510 in response to the Type I request 512 from the client. In Type I processing, the server opens a communication socket, designated socket “a” in FIG. 5A, and sends a Jar to the client. The first communication socket, socket “a,” is closed once the Jar is sent to the client. The client then extracts the Jar and runs it as a video applet once the entire Jar arrives at the client system. Alternatively, the functionality of the video applet can be implemented by software or firmware at the client.

The video applet running on the client system makes a request to the server running on the host. The request specifies parameters necessary for activation of a Common Gateway Interface (CGI) necessary for multimedia distribution. The video applet request may supply CGI parameters for video source selection, frame rate, compression level, image resolution, image brightness, image contrast, image view, and other client-configurable parameters. The specific parameters included in the request can be determined by which button or link was selected as part of the Type I request. The web page may offer a separate button or link for each of several classes of clients. These classes refer to the capability of clients to receive data in specific formats and at specific rates. For example, one button may correspond to a request for the data at a high video stream rate (30 frames per second) while another button corresponds to a request for the data in simple JPEG (single frame) format. Alternatively, the video applet can survey the capabilities of the client system and select appropriate parameters based upon the results of the survey, or the video applet can respond to user input.
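
A hypothetical applet-side request of this kind might look as follows; the CGI path and parameter names are invented for illustration and are not specified in this description.

    // Hypothetical request URL built by the applet; the CGI path and
    // parameter keys below are illustrative only.
    String request = "http://host.example.com/cgi-bin/video"
            + "?source=1"                   // video source selection
            + "&fps=30"                     // requested frame rate
            + "&compression=60"             // compression level
            + "&width=320&height=240"       // image resolution
            + "&brightness=50&contrast=50"; // image settings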

The server receives the video applet request and, in response, establishes a communication port, denoted socket “b,” between the server and the client.

The server then launches a CGI using the parameters supplied by the video applet request and provides client access on socket “b.” The video CGI 530 established for the client then sends the formatted video image stream over the socket “b” connection to the video applet running on the client. The video applet running on the client receives the video images and produces images displayed at the client.

The applet may be configured to perform a traffic control function. For example, the client may have requested a high stream rate (e.g., 30 frames per second) but may be capable of processing or receiving only a lower rate (e.g., 10 frames per second). This reduced capability may be due, for example, to network transmission delays or to other applications running on the client requiring more system resources. Once a transmission buffer memory is filled, the server is unable to write further data. When the applet detects this backup, it submits a request to the server for a reduced stream rate. This request for change is submitted via, for example, a “set parameter” CGI 570, or a frame rate CGI, which is described in further detail below with reference to FIG. 5B.

To detect a backup, the applet can compare a timestamp embedded in each frame (described below with reference to FIG. 6) with the client's internal clock, for example. By detecting a change in the relative time between consecutive frames, the applet is able to recognize the backup and skip processing of delayed frames. Thus, the client proceeds to process the current frame rather than an old frame. For example, if the client receives 30 frames per second and can only process one frame per second, the applet will cause the client to process the first frame, skip the next 29 frames and process the 31st frame.
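
A minimal Java sketch of this backup detection follows, assuming each frame carries a millisecond timestamp as described with reference to FIG. 6; the class name and tolerance value are illustrative.

    // Sketch of the applet's backup detection. The applet tracks the
    // offset between each frame's embedded timestamp and the client
    // clock; a growing offset means frames are arriving faster than the
    // client can process them, so stale frames are skipped.
    public class FrameSkipper {
        private long baselineOffset = Long.MIN_VALUE;
        private final long toleranceMs;

        FrameSkipper(long toleranceMs) { this.toleranceMs = toleranceMs; }

        boolean shouldProcess(long frameTimestampMs) {
            long offset = System.currentTimeMillis() - frameTimestampMs;
            if (baselineOffset == Long.MIN_VALUE) baselineOffset = offset;
            // Offset growth beyond the tolerance indicates a backup:
            // skip this (delayed) frame and wait for a current one.
            if (offset - baselineOffset > toleranceMs) return false;
            baselineOffset = Math.min(baselineOffset, offset);
            return true;
        }
    }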

The client can also select to view only a portion of the image. For example, the client may select a region of the image that he wishes to magnify. The applet allows the client to submit a request to the CGI to transmit only blocks corresponding to the selected region. By sending only the selected blocks, the necessary bandwidth for transmission is further reduced. Thus, the client can zoom to any region of the captured image. As a further example, the client may submit a request, via the applet, to pan across the image in any direction, limited only by the boundaries of the captured image. The applet submits this request as a change in the requested region.

Each time a video frame or audio block is encoded in the server, it is available to be transmitted to the client. The video CGI 530 determines, according to the parameters passed by the video applet, whether to submit a request for an additional video frame and whether to send the additional information to the client.

A similar audio CGI 560 is established using an audio applet running on the client. Each time an audio block is encoded at the server, it is available to be transmitted to the client. The audio CGI 560 transmits the audio information to the client as a continuous stream.

The applet may be configured to perform an audio traffic control function similar to that described above with respect to the video CGI 530. For example, the client may have initially requested an 8-bit audio stream but may be capable of only handling a 4-bit or a 2-bit stream.

2-bit and 4-bit audio streams are encoded based on Adaptive Differential Pulse Code Modulation (ADPCM) as described by Dialogic Corporation. The 4-bit audio samples are generated from 16-bit audio samples at a fixed rate. The 2-bit audio encoder modifies the standard ADPCM by removing the two lowest step bits, resulting in 2-bit samples from the original 16-bit data. An 8-bit stream is generated by converting 16-bit samples into 8 bits using a μ-law encoder, which is utilized in the Sun Microsystems, Inc. audio file format. This encoder is defined in the ITU-T standard G.711.
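
As a sketch of the 16-bit-to-8-bit conversion, the following Java method implements the standard G.711 μ-law encoding steps (bias, clip, segment search, and final inversion). It illustrates the general technique rather than the exact encoder used by the described system.

    // Standard G.711 mu-law encoding of a 16-bit PCM sample to 8 bits.
    public class MuLaw {
        private static final int BIAS = 0x84;   // 132, standard G.711 bias
        private static final int CLIP = 32635;  // clip level before biasing

        static byte encode(short pcm16) {
            int sign = (pcm16 >> 8) & 0x80;      // keep the sign bit
            int sample = sign != 0 ? -pcm16 : pcm16;
            if (sample > CLIP) sample = CLIP;    // avoid overflow after bias
            sample += BIAS;
            int exponent = 7;                    // find the segment number
            for (int mask = 0x4000; (sample & mask) == 0 && exponent > 0; mask >>= 1)
                exponent--;
            int mantissa = (sample >> (exponent + 3)) & 0x0F;
            // G.711 transmits the complement of the assembled byte.
            return (byte) ~(sign | (exponent << 4) | mantissa);
        }
    }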

When the applet detects a discrepancy between the transmitted audio data and the capabilities of the client, it submits a request for change to the server. The audio CGI 560 then closes the audio stream and reopens it at the appropriate data rate.

As noted above, the client determines the type of CGI that controls the information flowing to it on socket “b” by making the appropriate request. In the case of a JPEG Push CGI 540 or a Wireless Access Protocol (WAP) CGI 550, no applet is involved and no socket “b” is established. For example, if the client is an Internet-enabled wireless device utilizing a WAP browser, a video CGI 530 is not set up. Instead, a WAP-enabled device requests a WAP CGI 550 to be set up at the server. Video frames are then routed to the WAP-enabled device using the WAP CGI in lieu of the video CGI 530 via socket “a”. The video frames are routed to the client as JPEG files. Similarly, a JPEG Push CGI 540 is set up at the server if the client requests JPEG Push. In response to a request by a client, the web server 510 establishes a separate socket “b” connection to the server and utilizes, for that particular client, a separate CGI that is appropriate for its capabilities.

An additional CGI that utilizes a socket is the “set parameter” CGI 570. A client may revise the parameters that control the received images and audio by adjusting controls that are available on the video applet. When the client requests a change in parameters, the “set parameter” CGI 570 is launched to change the parameters at the server. It can be seen that each individual client may change the CGI settings associated with that particular client without affecting the images or audio being sent to any other client. Thus, each individual client has control over its received multimedia without affecting the capture process running on the server system.

FIG. 5B is a block diagram illustrating the streaming of the video data by the host to clients and the flow of commands and information between components of the host and the client. The video streaming begins when the client, via the remote user's web browser 505 a, sends a request (indicated by line 581) to the host server system 510. In one embodiment, the request is an HTTP request. In response to the request, the server system 510 sends (line 582) a Jar to the client's web browser 505. The Jar includes an applet that is launched by the client's web browser 505. Although FIG. 5B indicates the web browser 505 as having two blocks 505 a, 505 b, it is understood that the two blocks 505 a, 505 b only illustrate the same browser before and after the launching of the applet, respectively. Among other functions, the applet then sends a request to the web server 510 for the web server 510 to launch a CGI (line 583). Additionally, the applet causes the client to send client-specific parameters to the web server 510. In response to the request, the web server 510 establishes a socket and launches a CGI 530 according to the parameters supplied by the client and information associated with the socket (line 584). The CGI 530 submits periodic requests for video information to a video encoder 525 (line 585). The video encoder 525 receives JPEG-encoded video data from a video capture module 515 and formats the data for streaming as described, for example, below with reference to FIGS. 6 and 7 (line 586). The encoder 525 responds to the requests from the CGI 530 by transmitting the encoded video information to the CGI 530 (line 585). The video encoder module 525 and the video CGI module 530 may be sub-modules in the video CGI 52 a shown in FIG. 1. The CGI 530 transmits the encoded video frames to the applet over the established socket (line 587). The applet decodes the encoded video frames, providing video to the user.

As noted above, the applet may be configured to perform a traffic control function. When the applet is launched on the remote viewer's browser 505 b, it launches a frame-rate monitoring thread 535 (line 591). The thread 535 monitors the video stream for frame delays (step 545) by, for example, comparing time stamps of video frames with the client's internal clock, as described above. As indicated in FIG. 5B, the video applet continuously checks for frame delays (line 593). When a frame delay is detected (line 594), the applet requests that the web server 510 launch a frame-rate CGI 555. The request also submits parameters to indicate the frame rate capabilities of the client. The parameters are submitted to the video CGI 530 (line 595), which changes the rate at which video is streamed to the user.

The video CGI compresses and formats the video images for streaming in order to reduce the required network bandwidth. The video applet running on the client extracts the video image from the compressed and encoded data. A block diagram of the video stream format is shown in FIG. 6. The video stream can be formatted in several ways, with each format transmitting separate video image information. All video stream formats are comprised of a single six-byte header 602 followed by a number of video blocks 604 a-604 nn.

The six-byte header 602 is made up of a one-byte error code 610, a one-byte source 612, and a four-byte connection ID 614. The one-byte error code 610 indicates whether an error is present in the transmission. A zero value error code 610 indicates a successful transmission follows. A non-zero error code indicates an error has been detected and no data blocks will follow. The non-zero error code 610, therefore, indicates the data stream is complete. The one-byte source 612 indicates the origin of the video image. A zero value source 612 indicates the host as the source of the video image. A one in the source 612 indicates the image is coming from a mirror site. The use of a mirror site is discussed in detail below. Use of a mirror site is not otherwise detectable by the client and does not degrade the image received at the client. The four-byte connection ID 614 is used to designate the specific client. The connection ID 614 is an identifier that is unique to each connected user.
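
The six-byte header can be read directly from the stream, as in the following Java sketch; big-endian (network) byte order is assumed here, since the description does not state the byte order.

    import java.io.DataInputStream;
    import java.io.IOException;

    // Sketch of parsing the six-byte stream header of FIG. 6.
    record StreamHeader(int errorCode, int source, int connectionId) {
        static StreamHeader read(DataInputStream in) throws IOException {
            int errorCode = in.readUnsignedByte(); // non-zero: error, no data follows
            int source = in.readUnsignedByte();    // 0 = host, 1 = mirror site
            int connectionId = in.readInt();       // unique per connected client
            return new StreamHeader(errorCode, source, connectionId);
        }
    }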

A series of video blocks 604 follow the header 602. Different video block formats are used to transmit different size video images. However, all video block formats utilize a structure having a four-byte frame size field 620 followed by a four-byte block type field 622, followed by block data fields 624.

A first type of video block 604 is defined as block type N, where N represents a positive integer defining the number of image segments encoded in the block. A block type N format utilizes a data triplet to define each of N video segments. Each of the N data triplets contains a four-byte X position field 632, a four-byte Y position field 634, and a four-byte width field 636. The X and Y positions define the location of the segment on the client screen. The width field 636 defines the width of the video segment. The height of the video segment for the block type N video format is preset at sixteen pixels. Thus, each of the data triplets defines a video stripe image that is displayed on the client screen. Following the N data triplets, the block type N video format utilizes a series of data blocks. A four-byte data offset field 640 is used to facilitate faster transmission of data by not transmitting identical bytes of data at the beginning of each image. For example, two consecutive images may have the identical first 600 bytes of data. The data offset field 640 will be set to 600 and will prevent retransmission of those 600 bytes.

A Data Size (DS) field 642 follows the data offset field 640 and is used to define the size of the data field that follows. Two four-byte timestamp fields 644, 646 follow the DS field 642. The first timestamp field 644 is used to timestamp the video image contained in the block type N image. The timestamp 644 may be used to update a timestamp that is displayed at the client. The second timestamp field 646 is used to synchronize the video stream with an audio stream. The contents of the DS field 642 define the number of data bytes in the data field 648 that follows the timestamp fields 644 and 646. The information in the data field 648 is JPEG encoded to compress the video image. Thus, each data triplet defines the location and width of a JPEG-encoded video image stripe. The image is a single video stripe in the image when all of the segments are in the same Y coordinate. The initial segment 650 a is a sixteen-pixel-high segment having a width defined in the first data triplet. Similarly, subsequent segments 650 b-650 n are sixteen-pixel-high segments with widths defined by the width fields 636 b-636 n of the corresponding triplets.
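
The data offset field 640 described above can be computed as the length of the identical prefix shared by two consecutive encoded images. A minimal Java sketch:

    // Compute the data offset (field 640): the number of leading bytes
    // shared by consecutive encoded images, which need not be resent.
    static int dataOffset(byte[] previous, byte[] current) {
        int limit = Math.min(previous.length, current.length);
        int offset = 0;
        while (offset < limit && previous[offset] == current[offset]) {
            offset++; // bytes identical so far; skip retransmission
        }
        return offset; // e.g., 600 if the first 600 bytes match
    }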

Another video block type is denoted block type-3 and is also known as a Single Block type. The structure of the Single Block is shown in FIG. 7. The Single Block format begins with a pair of four-byte data fields. The first four-byte data field provides the initial horizontal location, X₀ 710. The second four-byte field provides the initial vertical location, Y₀ 712. The coordinates X₀ 710 and Y₀ 712 define the upper left corner of the video image provided in the Single Block. A second pair of four-byte data fields follows the first pair. The second pair of data fields defines the lower right corner of the video image provided in the Single Block. The first data field in the second pair provides the final horizontal position, X₁ 714, and the second data field in the pair provides the final vertical position, Y₁ 716. A four-byte Data Offset field 718 follows the two pairs of coordinates. A Data Size (DS) field 720 follows the Data Offset field 718 and is used to define the number of bytes in the data field 726. Immediately following the DS field 720 are two four-byte timestamp fields 722 and 724 to identify the time the video image was generated. The video applet running on the client can extract the timestamp information in order to overlay a timestamp on the image. The Single Block is completed with a data field 726 consisting of the number of data bytes defined in the DS field 720. Thus, the Single Block type defines a rectangular video image spanning the coordinates (X₀, Y₀)-(X₁, Y₁).

Block type-4, also designated a Synchronization Frame, has a data format identical to that of the above-described Single Block. In the Synchronization Frame, the initial horizontal and vertical coordinates, X₀ and Y₀, are set to zero. Setting the initial coordinates to zero aligns the upper left corner of the new image with the upper left corner of the existing image. The final horizontal and vertical coordinates in the Synchronization Frame correspond to the width of the whole image and the height of the whole image, respectively. Therefore, it can be seen that the Synchronization Frame can be used to refresh the entire image displayed at the client. The Synchronization Frame is used during the dynamic update of the video frame rate in order to limit transmission delays, as described above with reference to FIG. 5B.

Block type-1 does not contain any image data within it. Rather, it is used to indicate a change in the transmitted image size. The block type-1 format consists of a four-byte data field containing the New Width 740, followed by a four-byte data field containing the New Height 742. The block type-1 information must be immediately followed by a full-image Single Block or Synchronization Frame.

Finally, block type-2 is designated the Error Block. The Error Block consists solely of a one-byte Error Code 750. The Error Block is used to indicate an error in the video stream. Transmission of the video stream is terminated following the Error Code 750.

Referring now to FIG. 8, motion detection which can be carried out by the host will be described. Once the image has been captured into a JPEG-encoded frame, for example, the contents of a frame can further be processed by the main program module 46 (see FIG. 1) as follows. Data from subsequent video frames can be compared to determine whether the frames capture motion. FIG. 8 shows a flow chart of the motion detection process. A JPEG-encoded frame is received from the video capture module 40 a by the main program module 46 (see FIG. 1). The frame is first subdivided into a grid of, for example, 16 blocks by 16 blocks in order to detect motion within sequential images (step 802). Motion can be detected in each individual block. The number of blocks used to subdivide the frame is determined by the precision with which motion detection is desired. A large number of blocks per frame increases the granularity and allows for fine motion detection but comes at a cost of processing time and increased false detection of motion due to, for example, jitter in the image created by the camera or minute changes in lighting. In contrast, a lower number of blocks per frame provides decreased resolution but allows fast image processing. Additionally, the frame may be the complete image transmitted to the clients or may be a subset of the complete image. In other words, motion detection may be performed on only a specific portion of the image. The host user may determine the size and placement of this portion within the complete image, or it may be predetermined.

Once the frame has been subdivided, each block in the grid is motion processed (referenced in FIG. 8 as 810). Motion processing is performed on each block using comparisons of the present image with the previous image. First, at step 812, a cross-correlation between the block being processed of the current image and the corresponding block of the previous image is calculated. In one embodiment, the cross-correlation includes converting the captured blocks to grayscale and using the gray values of each pixel as the cross-correlated variable. Alternatively, the variable used for cross-correlation may be related to other aspects of the image, such as the light frequency of pixels.

At step 814, the cross-correlation is then compared with a predetermined threshold. The predetermined cross-correlation threshold can be a static value used in the motion detection process or it can be dynamic. If the cross-correlation threshold is dynamic, it may be derived from the size of the blocks or may be set by the host user. The host user may set the cross-correlation threshold on a relative scale, where the scale is relative to a range of acceptable cross-correlation values. Use of a relative scale allows the host user to set a cross-correlation threshold without having any knowledge of cross-correlation. It may be preferable for the cross-correlation threshold to be set higher when the block size is large. In contrast, a lower cross-correlation threshold may be preferable where the block size is small and there are not many pixels defining the block. In addition, the cross-correlation threshold can be set in accordance with the environment in which the system operates (e.g., outdoor versus indoor) and the particular use of the motion detection (e.g., detecting fast movement of large objects).

If, at step 814, the cross-correlation threshold is not exceeded (i.e., the blocks are sufficiently different), the process next calculates the variance in the brightness of the block over the corresponding block of the previous image (step 816). The variance is compared against a variance threshold at step 818. Again, the variance threshold may be static or dynamically determined. If the calculated variance falls below the variance threshold, then no motion is indicated in the block, and the process continues to step 890. The block is not marked as one having motion. However, if the variance exceeds the variance threshold, the block is marked as having motion at step 820, and the process continues to step 890.

On the other hand, if the calculated cross-correlation is above the predetermined threshold at step 814 (i.e., the blocks are sufficiently similar), then no motion has been detected, and the process continues to step 890. The block is not marked as one having motion. In an alternate embodiment, only the brightness variance may be calculated and compared to a variance threshold; that is, the brightness variance alone may be sufficient to detect motion. However, to reduce the number of false positives, the preferred embodiment illustrated in FIG. 8 requires both a sufficient brightness variance and a sufficient change in the cross-correlation variable.
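
Putting steps 812 through 820 together, one block's decision logic might look like the sketch below, which reuses to_gray and cross_correlation from the previous sketch. The threshold values are illustrative assumptions, and reading step 816 as the variance of the brightness difference is one plausible interpretation of the text.

```python
def block_has_motion(current, previous,
                     cc_threshold: float = 0.9,
                     var_threshold: float = 25.0) -> bool:
    """Mark a block as moving only when it is both decorrelated from and
    different in brightness from the corresponding previous block."""
    if cross_correlation(current, previous) > cc_threshold:
        return False                      # step 814: blocks similar, no motion
    diff = to_gray(current) - to_gray(previous)   # step 816
    return float(diff.var()) > var_threshold      # steps 818/820
```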

At step 890, the routine checks to see if all blocks have been processed. If all blocks have been processed, the motion detection routine in the main program 46 terminates (step 899) and returns the results to the video capture module 40 a shown in FIG. 1. However, if not all blocks of the current image have been processed, the routine returns to motion processing (reference 810) to analyze the next block.

FIG. 9 shows a flow chart of the motion detection process performed by the main program 46 (see FIG. 1) on a frame level. Motion detection requires comparison of at least two frames, one of which is used as a reference frame. Initially, a first frame is captured and used as the reference frame for determining motion detection (step not shown in FIG. 9). The first step in detecting motion is capture of the current frame (step 902). Motion detection (step 800) on the block level, as described above with reference to FIG. 8, is performed on the captured frame using the initial frame as the reference. Following motion detection on the block level (step 800), the motion detection process calculates the fraction of blocks that have motion (step 910). The calculated fraction is compared against “low,” “medium,” and “high” thresholds. The thresholds may be static or dynamic, as described above for the thresholds in the block motion detection process (step 800).

If, at step 920, the calculated fraction falls below the “low” threshold, then no motion has been detected in the frame, and the detection process proceeds to step 990. However, if the calculated fraction exceeds the lowest threshold, then the fraction must lie within one of the three other ranges, and the process continues to step 930.

At step 930, the calculated fraction is compared against the “medium” threshold. If the calculated fraction does not exceed the “medium” threshold (i.e., the fraction is in the low-medium range), the process continues to step 935. At step 935, the motion detection process performs “slight” responses. Slight responses may include transmitting a first email notification to an address determined by the host user, sounding an audible alert, originating a phone call to a first number determined by the host user, or initiating predetermined control of external hardware, such as alarms, sprinklers, or lights. Any programmable response may be associated with the slight responses, although advantageously, the lowest level of response is associated with the slight response. After performing the “slight” responses, the process continues to step 960.

If, at step 930, the calculated fraction exceeds the “medium” threshold, the process continues to step 940. At step 940, the calculated fraction is compared against the “high” threshold. If the calculated fraction does not exceed the “high” threshold (i.e., the fraction is in the medium-high range), the process continues to step 945. At step 945, the motion detection process performs “moderate” responses. Moderate responses may include any of the responses that are included in the slight responses. Advantageously, the moderate responses are associated with a higher level of response. A second email message may be transmitted indicating the detected motion lies within the second range, or a second predetermined phone message may be directed to a phone number determined by the host user. After performing the “moderate” responses, the process continues to step 960.

If, at step 940, the calculated fraction exceeds the “high” threshold (i.e., the fraction is in the high range), the process continues to step 950. At step 950, the motion detection process performs “severe” responses. Advantageously, the most extreme actions are associated with severe responses. The severe responses may include transmitting a third email message to a predetermined address, originating a phone call with a “severe” message to a predetermined phone number, originating a phone call to a predetermined emergency phone number, or controlling external hardware associated with severe responses. External hardware may include fire sprinklers, sirens, alarms, or emergency lights. After performing the “severe” responses, the process continues to step 960.

At step 960, the motion detection process logs the motion and the first twelve images having motion, regardless of the type of response performed. The motion detection threshold is, in this manner, used as a trigger for the recording of images relating to the motion-triggering event. The images are time-stamped, correlating the motion-triggering event with a time frame. Motion detection using this logging scheme is advantageously used in security systems or any system requiring image logging in conjunction with motion detection. The motion detection process is done (step 990) once the twelve motion images are recorded. The motion detection process may be part of a larger process such that the motion detection process repeats indefinitely. Alternatively, the motion detection process may run on a scheduled basis as determined by another process. Although the foregoing example utilizes low, medium and high thresholds, fewer or more thresholds can be used.
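
The frame-level classification of FIG. 9 reduces to comparing one fraction against the three thresholds. The sketch below shows that dispatch; the numeric thresholds and the tier labels are illustrative assumptions, with the actual responses (emails, calls, hardware control) left as comments.

```python
def classify_frame(moving_blocks: int, total_blocks: int,
                   low: float = 0.05, medium: float = 0.25,
                   high: float = 0.50) -> str:
    """Steps 910-950: map the fraction of moving blocks to a response tier."""
    fraction = moving_blocks / total_blocks       # step 910
    if fraction < low:                            # step 920
        return "none"
    if fraction < medium:                         # step 930
        return "slight"     # e.g., first email notification (step 935)
    if fraction < high:                           # step 940
        return "moderate"   # e.g., second email or phone message (step 945)
    return "severe"         # e.g., emergency call, sirens (step 950)
```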

Additional advantages may be realized using block motion detection in conjunction with the different image encoding formats shown in FIG. 6 and FIG. 7. Transmitting a complete video image to a client requires a great deal of network bandwidth even though the image may be JPEG-encoded. The amount of network bandwidth required to transmit images to a client can be reduced by recognizing that most of the data within an image remains the same from one image to the next. Only a small fraction of the image may include data not previously transmitted to the client in a previous image. The network bandwidth requirement can be reduced by transmitting only those portions of the image that change from image frame to image frame. The client is not aware that the entire image is not retransmitted each time because those blocks that are not retransmitted contain no new information.

A process for conserving network bandwidth by transmitting only changed image blocks is performed by the video CGI 52 a (see FIG. 1) and is shown in FIG. 10. The process begins by capturing an image (step 1010). The process then performs block motion detection 800 as described above with reference to FIG. 8. Additionally, at step 1020, the oldest blocks in the image, those unchanged after a predetermined number of image capture cycles, are marked as having changed even though they may remain the same. Marking the oldest blocks as having changed allows the image at the client to be refreshed over a period of time even though there may be no new information in the image frame. At step 1030, the process branches depending on the chosen compression level. The level of compression may be preselected by the host. Alternatively, the host may offer the client a choice of compression levels. If low compression is selected, the process continues to step 1040, and the image to be transmitted to the client is set to the full image frame. The process then constructs the appropriate header (step 1042) and creates the JPEG image for the full image frame (step 1044). The process then proceeds to step 1090.

When medium compression is selected at step 1030, the process first finds the minimum region containing changed blocks (step 1050). The fraction of changed blocks in the minimum region is compared to a predetermined threshold at step 1052. If the fraction exceeds the predetermined threshold, the process constructs a header (step 1042), creates a JPEG image (step 1044), and proceeds to step 1090. On the other hand, if the fraction is less than the predetermined threshold at step 1052, the process continues to step 1060.

If high compression is selected at step 1030, the process continues to step 1060. At step 1060, the process constructs a header and stripe image for the changed blocks and the oldest unchanged blocks and proceeds to step 1065. At step 1065, the process creates the JPEG blocks for the stripe image and proceeds to step 1090. At step 1090, the data is transmitted to the client.
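
The three compression paths of FIG. 10 can be summarized as below. The sketch assumes the block-level change flags from FIG. 8 are available as a two-dimensional boolean array; the max_age and region_threshold values, and the choice to route the dense-region branch back to the full-frame steps 1042-1044, are illustrative assumptions drawn from the figure references.

```python
import numpy as np

def select_transmission(changed: np.ndarray, ages: np.ndarray,
                        compression: str, max_age: int = 30,
                        region_threshold: float = 0.5) -> str:
    """FIG. 10 sketch: decide what to JPEG-encode and send (steps 1020-1090).

    `changed` is the 2-D boolean grid from block motion detection; `ages`
    counts capture cycles since each block was last transmitted.
    """
    # Step 1020: mark the oldest blocks as changed so the client's image
    # is eventually refreshed even when nothing moves.
    changed = changed | (ages >= max_age)

    if compression == "low":                          # step 1040
        return "full frame"

    if compression == "medium":
        rows, cols = np.nonzero(changed)
        if rows.size == 0:
            return "nothing to send"
        # Step 1050: minimum bounding region containing all changed blocks.
        region = changed[rows.min():rows.max() + 1, cols.min():cols.max() + 1]
        if region.mean() > region_threshold:          # step 1052
            return "full frame"                       # via steps 1042-1044
        return "stripe image of changed blocks"       # fall through to 1060

    return "stripe image of changed blocks"           # high: steps 1060-1065
```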

FIG. 11 is a block diagram of one format of an audio stream. The audio stream comprises a series of audio frames 1110 that are transmitted by the host in encoded form to the client. The encoding of an audio frame is described below with reference to FIG. 12. Additionally, the host also compresses the audio data to reduce the required bandwidth for transmission. Each audio frame 1110 has a header 1120 followed by eight blocks 1121-1128 of encoded audio data.

The header 1120 of each audio frame 1110 comprises five fields. The first is a host time field 1130. This four-byte field indicates the host clock time corresponding to the audio frame. The host time field 1130 allows the client to, for example, match the audio frame to the corresponding video frame. The second field in the frame header 1120 is a one-byte bit depth field 1132. The bit depth field 1132 is followed by a two-byte frame size field 1134. The frame size field 1134 communicates the length of the audio frame to the client. The last two fields in the frame header 1120 contain decoder variables that correspond to the method used to encode the audio frames. These fields include a two-byte LD field 1136 and a one-byte SD field 1138. The LD and SD fields 1136, 1138 are algorithm-specific variables used with the 2-bit and 4-bit ADPCM audio encoders discussed above with reference to FIG. 5A.
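
The ten-byte header layout described above maps directly onto a fixed-format pack. The field widths follow the text; the big-endian byte order below is an assumption for illustration.

```python
import struct

def pack_audio_header(host_time: int, bit_depth: int, frame_size: int,
                      ld: int, sd: int) -> bytes:
    """Pack the five header fields 1130-1138: a four-byte host time, a
    one-byte bit depth, a two-byte frame size, and the two-byte LD and
    one-byte SD decoder variables."""
    return struct.pack(">IBHHB", host_time, bit_depth, frame_size, ld, sd)
```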

Each block 1121-1128 in the audio frame 1110 contains a silence map 1140 and up to eight packets 1141-1148 of audio data. The silence map 1140 is a one-byte field. Each of the eight silence bits in the silence map field 1140 corresponds to a packet of encoded audio data. The information in the silence bits indicates whether or not the corresponding packet exists in that block 1121-1128 of the audio frame 1110. For example, the silence map field 1140 may contain the following eight silence bits: 01010101, where 1 indicates a silent packet. This silence map field 1140 will be followed by only four packets of encoded audio data, corresponding to silence map bits 1, 3, 5 and 7. If the corresponding packet does not exist (e.g., those corresponding to silence map bits 2, 4, 6 and 8 in the above example), the client will insert a silence packet with no audio data in its place. Thus, only packets with non-silent data must be transmitted, thereby reducing the required bandwidth. Each packet that is transmitted after the silence map 1140 consists of 32 samples of audio data.
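
A short sketch of assembling one block from its silence map and packets follows. The 1-means-silent convention and the four-packet example come from the text; treating silence bit 1 as the most significant bit of the map byte is an assumption about bit ordering.

```python
def pack_audio_block(encoded_packets: list) -> bytes:
    """Build one block: a silence-map byte followed by non-silent packets.

    `encoded_packets` holds 8 entries, each either None (silent) or the
    encoded bytes of a 32-sample packet.
    """
    silence_map = 0
    payload = b""
    for i, pkt in enumerate(encoded_packets):
        if pkt is None:
            silence_map |= 1 << (7 - i)   # 1 = silent; packet omitted
        else:
            payload += pkt                # only non-silent data is sent
    return bytes([silence_map]) + payload

# Example matching the text: map 01010101 -> packets 1, 3, 5 and 7 are sent.
block = pack_audio_block([b"p1", None, b"p3", None, b"p5", None, b"p7", None])
assert block[0] == 0b01010101
```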

FIG. 12 is a flow chart illustrating the encoding and generation of the audio frame for transmission to the client. The encoding begins at step 1210 with the capture of 2048 audio samples from an audio source such as a microphone, CD player or other known source. The samples are then divided into packets of 32 samples each, and the packets are grouped into blocks, each block containing eight packets (step 1215). A group of eight blocks then forms a frame. At step 1220, the audio CGI 52 b (see FIG. 1) determines whether the current packet is silent. If the packet is silent, at step 1230, the silence bit in the silence map corresponding to the packet is set to 1. The data in the packet is not encoded, and the process continues to step 1260. If, on the other hand, the packet is not silent, the corresponding silence bit is set to 0 (step 1240), and the data in the packet is encoded (step 1250). The process then continues to step 1260.

After each packet is processed, the process determines whether the processed packet was the eighth and last packet of its block of data (step 1260). If the packet was not the last of its block, the process returns to step 1220 and processes the next packet of 32 samples. If the packet was the last of its block, the process writes the silence map and any non-silent packets into the block and proceeds to step 1270.

At step 1270, the process determines whether the preceding block was the eighth and last block of the audio frame. If the block was not the last of the frame, the process returns to step 1220 to begin processing the next block by processing the next packet of 32 samples. If the block was the last of the audio frame, the process writes the audio frame by writing the header and the eight blocks. At step 1280, the audio frame is transmitted to the client.

FIG. 13 is a block diagram illustrating the broadcast of the audio data by the host to clients and the flow of commands and information between components of the host and the client. The audio broadcast begins when the client, via the remote user's web browser 1310 a, sends a request (indicated by line 1391) to the host server system 1320. In one embodiment, the request is an HTTP request. In response to the request, the server system 1320 sends (line 1392) a JAR to the client's web browser 1310. The JAR includes an applet that is launched by the client's web browser. Although FIG. 13 indicates the web browser 1310 as having two blocks 1310 a, 1310 b, it is understood that the two blocks 1310 a, 1310 b only illustrate the same browser before and after the launching of the applet, respectively. Among other functions, the applet then sends a request to the web server 1320 for the web server 1320 to launch a CGI (line 1393). Additionally, the applet causes the client to send client-specific parameters to the web server 1320. In response to the request, the web server 1320 establishes a socket and launches a CGI 1330 according to the parameters supplied by the client and information associated with the socket (line 1394). The CGI 1330 submits periodic requests for audio sample information to an audio encoder 1350 (line 1395). The audio encoder 1350 receives audio samples from an audio capture module 1340 and encodes the samples as described, for example, above with reference to FIG. 12 (line 1396). The encoder 1350 responds to the periodic requests from the CGI 1330 by making the encoded audio information available to the CGI 1330 via, for example, shared memory (line 1395). The audio encoder module 1350 and the audio CGI module 1330 may be sub-modules in the audio CGI 52 b shown in FIG. 1. The CGI 1330 transmits the encoded audio frames to the applet over the established socket (line 1397). The applet decodes the encoded audio frames, providing audio to the user.

FIG. 14 is a flow chart of the dynamic domain name system (DNS) updating process performed by the IP PROC module 60 illustrated in FIG. 1. The updating process begins when the host 10 (see FIG. 1) connects to a network 20 such as the Internet. When the host 10 connects to the network 20, it may be assigned a different Internet Protocol (IP) address from that which it was assigned during a previous connection. For example, the host 10 may connect to the Internet 20 through a service provider. The updating process, therefore, first checks to determine whether the current IP address is new (step 1410). If the IP address is unchanged, the process continues to step 1450. On the other hand, if the IP address is new, at step 1420, the process sends a request to a DNS host server 90 to update the IP address. The DNS host server 90 updates the IP address corresponding to the requesting host in its database or in a DNS interface 92 of a service provider affiliated with the host 10 (step 1440). In response to the request, the process receives an update from the DNS host server 90 at step 1430. The process then proceeds to step 1450. The process is repeated at regular intervals, such as every 2 minutes, to keep the IP address in the DNS host server 90 updated. When a client 30 seeks to obtain data from a host 10, the client 30 is directed to the DNS host server 90, which uses the updated information to direct the client 30 to the proper host 10.
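
A minimal sketch of this periodic updating loop follows. Both get_current_ip and update_url are hypothetical stand-ins, since the text does not specify how the host discovers its address or the request format the DNS host server 90 accepts.

```python
import time
import urllib.request

def dns_update_loop(get_current_ip, update_url: str, interval: int = 120):
    """Keep the DNS host server's record of this host's IP address fresh."""
    last_ip = None
    while True:
        ip = get_current_ip()
        if ip != last_ip:                                  # step 1410
            # Step 1420: ask the DNS host server to record the new
            # address; step 1430: read its update in response.
            with urllib.request.urlopen(f"{update_url}?ip={ip}") as resp:
                resp.read()
            last_ip = ip
        time.sleep(interval)   # repeat at a regular interval, e.g., 2 minutes
```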

In a further embodiment, the host 10 may specify a schedule to the DNS host server 90. The schedule may indicate when the host 10 is connected to the network 20 and is available to clients 30. If the host 10 is not available, the DNS host server 90 can direct a client 30 to a web page providing the schedule and availability of the host 10 or other information. Alternatively, the DNS host server 90 can monitor when the host 10 is not connected to the network 20. When the host 10 is not connected to the network 20, the DNS host server 90 can direct a client 30 to a web page with an appropriate message or information.

FIG. 15 is a block diagram of a system for mirroring audio and video data streamed by the host. A mirror computer 1510 is configured with a web server process 1520 to interface with clients 1530. In response to requests from clients 1530 made to the web server process 1520, the mirror computer 1510 launches a CGI process, nph-mirr 1540, for each requesting client 1530. An AdMirror process 1550 running on the mirror computer 1510 coordinates the mirroring of one or more hosts 1560. When a client 1530 makes a request to the web server 1520 for a specific host 1560, the nph-mirr process 1540 corresponding to that client 1530 causes the AdMirror process 1550 to launch a Yowzer process 1570 for the specific host 1560 requested by the client 1530. The Yowzer process 1570 coordinates the connection of the mirror computer 1510 to the host 1560 and the streaming of the video and audio data from the host 1560. If a Yowzer process 1570 already exists for the specific host 1560, as may happen if the specific host 1560 has been previously requested by another client 1530, an additional Yowzer process 1570 is not launched. The AdMirror process 1550 then causes the Yowzer process 1570 corresponding to the requested host 1560 to interface with the nph-mirr process 1540 corresponding to the requesting client 1530. Thus, a single Yowzer process 1570 may support multiple nph-mirr processes 1540 and their corresponding clients 1530.
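
The one-Yowzer-per-host rule described above amounts to a small registry keyed by host. The sketch below is an illustrative reduction, not the patent's process model: launch_yowzer is a hypothetical factory standing in for the actual process launch, and the dictionary stands in for AdMirror's bookkeeping.

```python
yowzer_for_host: dict = {}   # AdMirror bookkeeping: one Yowzer per host

def attach_client(host_id: str, launch_yowzer):
    """Return the Yowzer serving host_id, launching one only if none
    exists, so a single Yowzer can feed many nph-mirr processes and
    their corresponding clients."""
    if host_id not in yowzer_for_host:
        yowzer_for_host[host_id] = launch_yowzer(host_id)
    return yowzer_for_host[host_id]
```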

Each nph-mirr process 1540 functions as, for example, the CGI 52 described above with reference to FIG. 1, and coordinates streaming of data from the host 1560 to the client 1530. Accordingly, the nph-mirr process 1540 sends an applet to the client 1530 and receives parameters related to the capabilities of the client 1530 and the client's browser. Thus, the client 1530 receives streamed data at, for example, a frame rate that corresponds to its capability to process the frames.

Thus, while the host 1560 streams data to the mirror computer 1510, the mirror computer 1510 assumes the responsibility of streaming the data to each of the clients 1530. This frees the host 1560 to use its processing power for maintaining high video and audio stream rates. The mirror computer 1510 may be a dedicated, powerful processor capable of accommodating numerous clients 1530 and numerous hosts 1560.

The foregoing description details certain embodiments of the invention. It will be appreciated, however, that no matter how detailed the foregoing appears, the invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiment is to be considered in all respects only as illustrative and not restrictive, and the scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

1. A method, implemented in hardware, of distributing media data to a client computer via a network from a server computer, the method comprising: receiving a data request at a server computer from a client computer via a network; receiving a client specific parameter from said client computer; capturing a stream of multimedia data comprising a video signal using at least one device configured to capture said stream of multimedia data; receiving at the server computer a selection of a region of said video signal to view on said client computer; encoding said stream of multimedia data in response to said client specific parameter, wherein said encoding is based on said selected region of said video signal; and streaming said encoded stream to said client computer from said server computer via the network according to said client specific parameter.
2. The method of claim 1, wherein said client computer specific parameter comprises data indicative of the processing capability of said client computer.
3. The method of claim 2, wherein streaming said encoded stream comprises streaming said encoded stream at a rate compatible with the processing capability of said client computer.
4. The method of claim 1, further comprising: receiving a second client specific parameter from a second client computer; encoding said stream of multimedia data in response to said second client specific parameter as a second encoded stream; and streaming said second encoded stream to said second client computer from said server computer via the network according to said second client specific parameter.
5. The method of claim 4, wherein said second encoded stream is streamed to said second client computer while said encoded stream is streamed to said client computer at a different rate.
6. The method of claim 1, wherein said client computer specific parameter is selected from the group consisting of video source selection, audio source selection, audio and video source selection, frame rate, compression level, image resolution, image brightness, image contrast, and image view.
7. The method of claim 1, wherein said media data comprises at least one of audio data and video data.
8. The method of claim 1, further comprising: launching a delay monitoring module on said client computer; detecting a change in received stream rate at said client computer; and communicating a request to a server computer requesting an updated client specific parameter comprising a multimedia data stream rate based on the detected change.
9. The method of claim 8, wherein detecting a change in received stream rate comprises repeatedly detecting a change at a time interval.
10. The method of claim 8, wherein detecting a change in received stream rate comprises detecting a change in received stream rate that exceeds a threshold.
11. The method of claim 1, wherein encoding said stream of multimedia data based on the selected region of said video signal comprises panning said video signal.
12. A system for distributing media data, the system comprising: at least one device configured to capture a stream of multimedia data, said stream comprising a video signal; a server configured to receive said stream of multimedia data from said at least one device; and a client computer configured to communicate a request for multimedia data to said server over a network, wherein said request comprises a client specific parameter and a selection of a region of said video signal to view on said client computer, and wherein said server is further configured to: encode said stream of multimedia data in response to said client specific parameter and based on said selection; and stream said encoded stream to said client computer according to said client specific parameter.
13. The system of claim 12, wherein said server is configured to provide said streaming media data at a rate compatible with the processing capability of the client computer.
14. The system of claim 12, further comprising: a second client computer, wherein said server is further configured to: receive a second client specific parameter from a second client computer; encode said stream of multimedia data in response to said second client specific parameter as a second encoded stream; and stream said second encoded stream to said second client computer from said server computer via the network according to said client specific parameter.
15. The system of claim 14, wherein said server is configured to stream said second encoded stream to said second client computer while streaming said encoded stream to said first client computer at a different rate.
16. The method of claim 1, wherein encoding said stream of multimedia data based on the selected region of said video signal comprises zooming said video signal.
17. A method, implemented in hardware, of distributing media data to a client computer via a network from a server computer, the method comprising: receiving a data request at a server computer from a client computer via a network; receiving a client specific parameter from said client computer; capturing a stream of multimedia data using at least one device configured to capture said stream of multimedia data; encoding said stream of multimedia data in response to said client specific parameter; streaming said encoded stream to said client computer from said server computer via the network according to said client specific parameter; launching a delay monitoring module on said client computer; detecting a change in received stream rate at said client computer; and communicating a request to the server computer requesting an updated client specific parameter comprising a multimedia data stream rate based on the detected change.